Random Forests Feature Selection with K-PLS: Detecting Ischemia from Magnetocardiograms

Authors

  • Long Han
  • Mark J. Embrechts
  • Boleslaw K. Szymanski
  • Karsten Sternickel
  • Alexander Ross
Abstract

Random Forests were introduced by Breiman for feature (variable) selection and improved predictions for decision tree models. The resulting model is often superior to AdaBoost and bagging approaches. In this paper the random forests approach is extended to variable selection with other learning models, in this case Partial Least Squares (PLS) and Kernel Partial Least Squares (K-PLS), which are used to estimate the importance of variables. This variable selection method is demonstrated on two benchmark datasets (Boston Housing and South African heart disease data). Finally, the methodology is applied to magnetocardiogram data for the detection of ischemic heart disease.

1 Partial Least Squares (PLS) and K-PLS

Partial Least Squares Regression (PLS) was introduced by Herman Wold [1] for econometric modeling of multivariate time series. PLS can be viewed as a “better” Principal Components Analysis (PCA) regression method, where the data are first projected into a different and non-orthogonal basis, and only the most important PLS components (or latent variables) are considered for building a regression model (similar to PCA). The difference between PLS and PCA is that the new set of basis vectors in PLS is not a set of successive orthogonal directions that explain the largest variance in the data, but a set of conjugate gradient vectors to the correlation matrix. The NIPALS implementation of PLS [2] is elegant and fast.

Rosipal introduced K-PLS in 2001 [3] as a nonlinear extension of the linear PLS method, as opposed to K-PLS with a linear kernel [4]. This nonlinear extension makes K-PLS a powerful machine learning tool for classification as well as regression. Powerful variable selection methods have been implemented for PLS and K-PLS, and unlike SVMs, multiple-output models are easy to implement. K-PLS can also be formulated as a paradigm closely related (and almost identical) [5] to Support Vector Machines (SVM) [6, 7].
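To make the NIPALS-style PLS fit mentioned above more concrete, here is a minimal sketch for a single response variable (PLS1). It is an illustration under stated assumptions, not the authors' implementation: the function names `pls1_nipals` and `pls1_predict` are hypothetical, the data are mean-centered but not scaled, and NumPy is assumed to be available.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit single-response PLS with NIPALS-style deflation (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # work on centered data
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                        # weight: covariance direction with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                     # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                           # latent variable (score vector)
        tt = t @ t
        p = Xk.T @ t / tt                    # X loading
        c = yk @ t / tt                      # y loading (scalar)
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression coefficients in the original (centered) feature space.
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(X, coef, x_mean, y_mean):
    """Predict with coefficients returned by pls1_nipals."""
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

# Small demonstration on synthetic data (values are made up for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 3.0
coef_demo, xm_demo, ym_demo = pls1_nipals(X_demo, y_demo, n_components=3)
print(coef_demo)
```

Keeping fewer components than features gives the dimension-reduced regression described in the text; keeping as many components as the rank of the centered data reproduces an ordinary least-squares fit.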
K-PLS uses the same kernel trick that is commonly used in SVMs. K-PLS also provides a purely statistical method that has been widely used in chemometrics during the past decade. In addition, the use of K-PLS rather than SVMs can be motivated on several levels: (i) PLS is the method of choice in chemometrics and drug design, and K-PLS is a natural extension of PLS; (ii) K-PLS results are generally ...

ESANN'2006 proceedings, European Symposium on Artificial Neural Networks, Bruges (Belgium), 26-28 April 2006, d-side publi., ISBN 2-930307-06-4.
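The kernel trick referred to in the excerpt replaces the raw data matrix with a matrix of pairwise kernel evaluations, on which a linear method such as PLS can then operate. The sketch below is only an illustration: the choice of a Gaussian kernel with a single width `sigma`, the centering step, and the function names `gaussian_kernel` and `center_kernel` are assumptions, since the excerpt does not state which kernel or preprocessing the authors used.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2 a.b
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    sq = np.maximum(sq, 0.0)                 # guard against tiny negative round-off
    return np.exp(-sq / (2.0 * sigma ** 2))

def center_kernel(K):
    """Center a square training kernel matrix in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return H @ K @ H

# Illustrative data: three points in the plane (made up for this example).
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_demo = center_kernel(gaussian_kernel(points, points, sigma=1.0))
print(K_demo.shape)
```

A kernelized PLS fit would then treat the centered kernel matrix as its input matrix, with `sigma` left as a tuning parameter.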

Similar resources

Random Forests Feature Selection with Kernel Partial Least Squares: Detecting Ischemia from MagnetoCardiograms

Random Forests were introduced by Breiman for feature (variable) selection and improved predictions for decision tree models. The resulting model is often superior to Adaboost and bagging approaches. In this paper the random forest approach is extended for variable selection with other learning models, in this case partial least squares (PLS) and kernel partial least squares (K-PLS) to estimate...

Sigma Tuning of Gaussian Kernels: Detection of Ischemia from Magnetocardiograms

This chapter introduces a novel Levenberg-Marquardt like second-order algorithm for tuning the Parzen window σ in a Radial Basis Function (Gaussian) kernel. In this case, each attribute has its own sigma parameter associated with it. The values of the optimized σ are then used as a gauge for variable selection. In this study, the Kernel Partial Least Squares (K-PLS) model is applied to several ...

Random Forests-based Feature Selection for Land-use Classification Using Lidar Data and Orthoimagery

The development of lidar systems, especially those incorporating high-resolution camera components, has shown great potential for urban classification. However, automatically selecting the best features for land-use classification is challenging. Random Forests, a newly developed machine learning algorithm, is receiving considerable attention in the field of image classification and pattern re...

Rapid Feature Selection Based on Random Forests for High-Dimensional Data

One of the important issues of machine learning is obtaining essential information from high-dimensional data for discrimination. Dimensionality reduction is a means to reduce the burden of dimensionality due to large-scale data. Feature selection determines significant variables and contributes to dimensionality reduction. In recent years, the random forests method has been the focus of resear...

Airborne Lidar Feature Selection for Urban Classification Using Random Forests

Various multi-echo and Full-waveform (FW) lidar features can be processed. In this paper, multiple classifiers are applied to lidar feature selection for urban scene classification. Random forests are used since they provide an accurate classification and run efficiently on large datasets. Moreover, they return measures of variable importance for each class. The feature selection is obtained by ...

Publication date: 2006